Automatic Paleographic Exploration of Genizah Manuscripts
Abstract
The Cairo Genizah is a collection containing approximately 250,000 hand-written fragments of mainly Jewish texts discovered in the late 19th century. The fragments are today spread out in some 75 libraries and private collections worldwide, and there is an ongoing effort to document and catalogue all extant fragments. Paleographic information plays a key role in the study of the Genizah collection. Script style, and more specifically handwriting, can be used to identify fragments that might originate from the same original work. Such matched fragments, commonly referred to as “joins”, are currently identified manually by experts, and presumably only a small fraction of existing joins have been discovered to date. In this work, we show that automatic handwriting matching functions, obtained from non-specific features using a corpus of writing samples, can perform this task quite reliably. In addition, we explore the problem of grouping various Genizah documents by script style, without being provided any prior information about the relevant styles. The results show that the automatically obtained grouping agrees, for the most part, with the paleographic taxonomy. In cases where the system fails, it is due to apparent similarities between related scripts.
Similar resources
Computerized Paleography Exploration of Historical Manuscripts
The modern scholar of history or of other disciplines is often faced today with hundreds of thousands of readily-available and potentially-relevant full or fragmentary documents, but without computer aids it is a very hard and usually even impossible task to find the sought-after needles in the proverbial haystack of online images. The Cairo Genizah is a collection containing approximately 250,...
Automatic extraction of catalog data from digital images of historical manuscripts
The Cairo Genizah, discovered in the late 19th century, is a collection of handwritten historical documents containing approximately 350,000 fragments of mainly Jewish texts. The fragments are today spread out in more than seventy libraries and private collections worldwide, and there is an ongoing effort to document and catalog all extant fragments. We explore three levels of extraction of cat...
Enriching Digitized Medieval Manuscripts: Linking Image, Text and Lexical Knowledge
This paper describes an on-going project of transcribing and annotating digitized manuscripts of medieval Spanish with paleographic and lexical information. We link lexical units from the manuscripts with the Multilingual Central Repository (MCR), making terms retrievable by any of the languages that integrate MCR. The goal of the project is twofold: creating a paleographic knowledge base from ...
SPI: A System for Paleographic Inspections
The main interest in paleographers' work is to relate the culture and the writing styles of ancient manuscripts by analysing the morphology of scripts. Unfortunately, experts often disagree on the analysis methods. For this reason, a user-independent system based on statistical methods can be very helpful for experts in determining which morphological features are relevant for the description of t...
Citation and Alignment: Scholarship Outside and Inside the Codex
We describe a hierarchical approach to modeling text that allows machine-actionable canonical citation of text at many levels of specificity. This model addresses the problem of overlapping or mutually exclusive analyses. In turn, this flexibility in citation allows rich linking of textual transcriptions and other data to regions-of-interest on digital images, of particular value to codicological...
Publication date: 2010